Detection of Outliers in Multivariate Data: A Method Based on Influence Eigen

نویسنده

  • Nazrina Aziz
چکیده

Outliers can be defined simply as an observation (or a subset of observations) that is isolated from the other observations in the data set. There are two main reasons that motivate people to find outliers; the first is the researchers intention. The second is the effects of an outlier on analyses. This article does not differentiate between the various justifications for outlier detection. The aim is to advise the analyst of observations that are considerably different from the majority. This article focuses on the identification of outliers using the eigenstructure of S and S(i) in terms of eigenvalues, eigenvectors and principle component. Note that S(i) is the sample covariance matrix of data matrix X(i), where the subscript i in parentheses is read as ”with observation i removed from X”. The idea of using the eigenstructure as a tool for identification of outliers is motivated by Maximum Eigen Difference (MED). MED is the method for identification of outliers proposed by [1]. This method utilises the maximum eigenvalue and the corresponding eigenvector. It is noted that examination of observations effect on the maximum eigenvalue is very significant. The technique for identification of outliers discuss in this article is applicable to a wide variety of settings. In this article, observations that are located far away from the remaining data are considered to be outliers.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Identification of outliers types in multivariate time series using genetic algorithm

Multivariate time series data, often, modeled using vector autoregressive moving average (VARMA) model. But presence of outliers can violates the stationary assumption and may lead to wrong modeling, biased estimation of parameters and inaccurate prediction. Thus, detection of these points and how to deal properly with them, especially in relation to modeling and parameter estimation of VARMA m...

متن کامل

Local multivariate outliers as geochemical anomaly halos indicators, a case study: Hamich area, Southern Khorasan, Iran

Anomaly recognition has always been a prominent subject in preliminary geochemical explorations. Among the regional geochemical data processing, there are a range of statistical and data mining techniques as well as different mapping methods, which serve as presentations of the outputs. The outlier’s values are of interest in the investigations where data are gathered under controlled condition...

متن کامل

A robust wavelet based profile monitoring and change point detection using S-estimator and clustering

Some quality characteristics are well defined when treated as response variables and are related to some independent variables. This relationship is called a profile. Parametric models, such as linear models, may be used to model profiles. However, in practical applications due to the complexity of many processes it is not usually possible to model a process using parametric models.In these cas...

متن کامل

Application of Recursive Least Squares to Efficient Blunder Detection in Linear Models

In many geodetic applications a large number of observations are being measured to estimate the unknown parameters. The unbiasedness property of the estimated parameters is only ensured if there is no bias (e.g. systematic effect) or falsifying observations, which are also known as outliers. One of the most important steps towards obtaining a coherent analysis for the parameter estimation is th...

متن کامل

A statistical test for outlier identification in data envelopment analysis

In the use of peer group data to assess individual, typical or best practice performance, the effective detection of outliers is critical for achieving useful results. In these ‘‘deterministic’’ frontier models, statistical theory is now mostly available. This paper deals with the statistical pared sample method and its capability of detecting outliers in data envelopment analysis. In the prese...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2012